SL Paper 1

At an early stage in analysing the marks scored by candidates in an examination paper, the examining board takes a random sample of 250 candidates and finds that the marks, \(x\), of these candidates give \(\sum x = 10985\) and \(\sum {x^2} = 598736\).

Calculate a 90% confidence interval for the population mean mark μ for this paper.

[4]
a.

The null hypothesis μ = 46.5 is tested against the alternative hypothesis μ < 46.5 at the λ% significance level. Determine the set of values of λ for which the null hypothesis is rejected in favour of the alternative hypothesis.

[4]
b.

Markscheme

\(\bar x = 43.94\)      (A1)

unbiased variance estimate = 466.0847        (A1)

Note: Accept sample variance = 464.2204.

⇒ 90% confidence interval is (41.7,46.2)       A1A1

[4 marks]

a.

Z-value is −1.87489 or −1.87866       (A1)

probability is 0.0304 or 0.0301      (A1)

λ ≥ 3.04 or λ ≥ 3.01       (M1)A1

[4 marks]

b.
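
Both parts can be checked numerically from the summary statistics alone. The following is a minimal sketch rather than part of the markscheme: it assumes Python's standard-library `statistics.NormalDist`, uses the unbiased variance estimate, and all variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, sum_x, sum_x2 = 250, 10985, 598736

mean = sum_x / n                                   # sample mean
var_unbiased = (sum_x2 - sum_x**2 / n) / (n - 1)   # unbiased variance estimate
se = sqrt(var_unbiased / n)                        # standard error of the mean

# (a) 90% confidence interval: 5% in each tail
z90 = NormalDist().inv_cdf(0.95)
ci = (mean - z90 * se, mean + z90 * se)

# (b) H0 is rejected when the significance level is at least the p-value
z = (mean - 46.5) / se
p_value = NormalDist().cdf(z)                      # lower-tail test
critical_lambda = 100 * p_value                    # expressed as a percentage
```

With the unbiased estimate, `critical_lambda` comes out close to 3.04, which corresponds to the first of the two Z-values quoted above.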

Examiners report

[N/A]
a.
[N/A]
b.



A random sample \({X_1},{\text{ }}{X_2},{\text{ }} \ldots ,{\text{ }}{X_n}\) is taken from the normal distribution \({\text{N}}(\mu ,{\text{ }}{\sigma ^2})\), where the value of \(\mu \) is unknown but the value of \({\sigma ^2}\) is known. The mean of the sample is denoted by \(\bar X\).

A mathematics teacher, wishing to apply the above result, generates some artificial data, assumes a value for the variance and calculates the following 95% confidence interval for \(\mu \),

\[[3.12,{\text{ }}3.25].\]

The teacher asks Alun to interpret this result. Alun makes the following statement. “The value of \(\mu \) lies in the interval \([3.12,{\text{ }}3.25]\) with probability 0.95.”

State the distribution of \(\frac{{\bar X - \mu }}{{\frac{\sigma }{{\sqrt n }}}}\).

[1]
a.i.

Hence show that, with probability 0.95,

\[\bar X - 1.96\frac{\sigma }{{\sqrt n }} \leqslant \mu  \leqslant \bar X + 1.96\frac{\sigma }{{\sqrt n }}.\]

[4]
a.ii.

Explain briefly why this is an incorrect statement.

[1]
b.i.

Give a correct interpretation.

[1]
b.ii.

Markscheme

\(\frac{{\bar X - \mu }}{{\frac{\sigma }{{\sqrt n }}}}\) is \({\text{N}}(0,{\text{ }}1)\) or it has the Z-distribution A1

[1 mark]

a.i.

attempt to make a probability statement     R1

therefore with probability 0.95,

\( - 1.96 \leqslant \frac{{\bar X - \mu }}{{\frac{\sigma }{{\sqrt n }}}} \leqslant 1.96\)     A1

\( - 1.96\frac{\sigma }{{\sqrt n }} \leqslant \bar X - \mu  \leqslant 1.96\frac{\sigma }{{\sqrt n }}\)     A1

\(1.96\frac{\sigma }{{\sqrt n }} \geqslant \mu  - \bar X \geqslant  - 1.96\frac{\sigma }{{\sqrt n }}\)     A1

\(\bar X + 1.96\frac{\sigma }{{\sqrt n }} \geqslant \mu  \geqslant \bar X - 1.96\frac{\sigma }{{\sqrt n }}\)

 

Note:     Award the final A1 for either of the above two lines.

 

\(\bar X - 1.96\frac{\sigma }{{\sqrt n }} \leqslant \mu  \leqslant \bar X + 1.96\frac{\sigma }{{\sqrt n }}\)     AG

[4 marks]

a.ii.

you cannot make a probability statement about a constant lying in a constant interval OR the mean either lies in the interval or it does not     A1

[1 mark]

b.i.

the confidence interval is the observed value of a random interval

OR if the process is carried out a large number of times, \(\mu \) will lie in the interval 95% of the time     A1

[1 mark]

b.ii.
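
The repeated-sampling interpretation can be illustrated by simulation. The sketch below is only a demonstration under assumed values: the parameters `mu=3.2`, `sigma=0.5` and `n=25` are hypothetical and are not taken from the question, and the function name `coverage` is ours.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, trials, seed=0):
    """Fraction of simulated 95% intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.975)           # about 1.96
    half_width = z * sigma / sqrt(n)          # sigma is treated as known
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / trials

# Hypothetical parameters chosen only for the demonstration.
rate = coverage(mu=3.2, sigma=0.5, n=25, trials=5000)
print(f"{rate:.3f}")
```

Each simulated interval either contains \(\mu \) or it does not; it is the long-run proportion of intervals containing \(\mu \) that should settle near 0.95.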

Examiners report

[N/A]
a.i.
[N/A]
a.ii.
[N/A]
b.i.
[N/A]
b.ii.



The lifetime, in years, of a randomly chosen basic vacuum cleaner is assumed to be modelled by the normal distribution \(B \sim {\text{N}}(14,{\text{ }}{3^2})\).

The lifetime, in years, of a randomly chosen robust vacuum cleaner is assumed to be modelled by the normal distribution \(R \sim {\text{N}}(20,{\text{ }}{4^2})\).

Find \({\text{P}}\left( {B > {\text{E}}(B) + \frac{1}{2}\sqrt {{\text{Var}}(B)} } \right)\).

[2]
a.

Find the probability that the total lifetime of 7 randomly chosen basic vacuum cleaners is less than 100 years.

[4]
b.

Find the probability that the total lifetime of 5 randomly chosen robust vacuum cleaners is greater than the total lifetime of 7 randomly chosen basic vacuum cleaners.

[5]
c.

Markscheme

\({\text{P}}(B > 15.5){\text{ }}\left( { = {\text{P}}(Z > 0.5)} \right)\)    (M1)

\( = (1 - 0.69146) = 0.309\)    A1

[2 marks]

a.

consider \(V = {B_1} + {B_2} + {B_3} + {B_4} + {B_5} + {B_6} + {B_7}\)     (M1)

\({\text{E}}(V) = 98\)    (A1)

\({\text{Var}}(V) = 63\) or equivalent     (A1)

 

Note:     No need to state \(V\) is normal.

 

\({\text{P}}(V < 100) = \left( {{\text{P}}\left( {Z < \frac{2}{{\sqrt {63} }} = 0.251976 \ldots } \right)} \right) = 0.599\)    A1

[4 marks]

b.

consider \(W = {R_1} + {R_2} + {R_3} + {R_4} + {R_5} - ({B_1} + {B_2} + {B_3} + {B_4} + {B_5} + {B_6} + {B_7})\)     (M1)

\({\text{E}}(W) = 2\)    (A1)

\({\text{Var}}(W) = 80 + 63 = 143\)    (A1)

\({\text{P}}(W > 0) = \left( {{\text{P}}\left( {Z < \frac{2}{{\sqrt {143} }}} \right)} \right)\)    (M1)

\( = 0.566\)    A1

[5 marks]

c.
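
Parts (b) and (c) both rest on the fact that means and variances add for sums of independent normal variables. A minimal sketch, assuming Python's standard-library `statistics.NormalDist`; the helper `total_of` is a hypothetical name introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist

def total_of(mean, sd, count):
    """Distribution of the sum of `count` independent N(mean, sd^2) variables."""
    return NormalDist(count * mean, sqrt(count) * sd)

basic_total = total_of(14, 3, 7)      # V ~ N(98, 63)
robust_total = total_of(20, 4, 5)     # N(100, 80)

# (b) P(V < 100)
p_b = basic_total.cdf(100)

# (c) W = robust total - basic total; variances add for independent variables
w = NormalDist(robust_total.mean - basic_total.mean,
               sqrt(robust_total.variance + basic_total.variance))
p_c = 1 - w.cdf(0)
```

Under these assumptions `p_b` and `p_c` should agree with the markscheme values to three significant figures.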

Examiners report

This was one of the more successful questions on the paper, with many wholly correct answers seen. Only a very small number of candidates failed to complete part (a) successfully. There were also many fully correct answers to part (b). Part (c) caused a problem for some candidates, most of whom failed to calculate the variance correctly.

a.

b.

c.



The weights of male students in a college are modelled by a normal distribution with mean 80 kg and standard deviation 7 kg.

The weights of female students in the college are modelled by a normal distribution with mean 54 kg and standard deviation 5 kg.

The college has a lift installed with a recommended maximum load of 550 kg. One morning, the lift contains 3 male students and 6 female students. You may assume that the 9 students are randomly chosen.

Find the probability that the weight of a randomly chosen male student is more than twice the weight of a randomly chosen female student.

[6]
a.

Determine the probability that their combined weight exceeds the recommended maximum.

[5]
b.

Markscheme

let \(M\), \(F\) denote the weights of the male, female

consider \(D = M - 2F\)     (M1)

\({\text{E}}(D) = 80 - 2 \times 54 =  - 28\)     A1

\({\text{Var}}(D) = {7^2} + 4 \times {5^2}\)     (M1)

\( = 149\)     A1

\({\text{P}}(M > 2F) = {\text{P}}(D > 0)\)     (M1)

\( = 0.0109\)     A1

 

Note:     Accept any answer that rounds correctly to 0.011.

 

[6 marks]

a.

consider \(S = \sum\limits_{i = 1}^3 {{M_i}}  + \sum\limits_{i = 1}^6 {{F_i}} \)     (M1)

 

Note:     Condone the use of the incorrect notation \(3M + 6F\).

\({\text{E}}(S) = 3 \times 80 + 6 \times 54 = 564\)     A1

\({\text{Var}}(S) = 3 \times {7^2} + 6 \times {5^2}\)     (M1)

\( = 297\)     A1

\({\text{P}}(S > 550) = 0.792\)     A1

 

Note:     Accept any answer that rounds correctly to 0.792.

 

[5 marks]

b.

Examiners report

[N/A]
a.
[N/A]
b.



Jim is investigating the relationship between height and foot length in teenage boys.

A sample of 13 boys is taken and the height and foot length of each boy are measured.

The results are shown in the table.

You may assume that this is a random sample from a bivariate normal distribution.

Jim wishes to determine whether or not there is a positive association between height and foot length.

Calculate the product moment correlation coefficient.

[2]
a.

Find the \(p\)-value.

[2]
b.

Interpret the \(p\)-value in the context of the question.

[1]
c.

Find the equation of the regression line of \(y\) on \(x\).

[2]
d.

Estimate the foot length of a boy of height 170 cm.

[2]
e.

Markscheme

Note: In all parts accept answers which round to the correct 2sf answer.

 

\(r = 0.806\)     A2

a.

\(4.38 \times {10^{ - 4}}\)     A2

b.

\(p\)-value represents strong evidence to indicate a (positive) association between height and foot length     A1

 

Note: FT the \(p\)-value

c.

\(y = 0.103x + 12.3\)     A2

d.

attempted substitution of \(x = 170\)     (M1)

\(y = 29.7\)     A1

Note: Accept \(y = 29.8\)

e.

Examiners report

Solutions to this question were often disappointing. Candidates were expected to use appropriate software on their calculators to do the whole question. However, some candidates used their calculators just to evaluate sums and sums of squares and then used the appropriate formulae to calculate the correlation coefficient, the p-value (which required the evaluation of the t-value first) and the equation of the regression line. This was a time-consuming exercise and introduced the possibility of arithmetic error.

a.

b.

c.

d.

e.



Bill is investigating whether or not there is a positive association between the heights and weights of boys of a certain age. He defines the hypotheses

\[{{\rm{H}}_0}:\rho  = 0;\quad {{\rm{H}}_1}:\rho  > 0,\]

where \(\rho \) denotes the population correlation coefficient between heights and weights of boys of this age. He measures the height, \(h\) cm, and weight, \(w\) kg, of each of a random sample of \(20\) boys of this age and he calculates the following statistics.

\[\sum w  = 340,\quad \sum h  = 2002,\quad \sum {{w^2}}  = 5830,\quad \sum {{h^2}}  = 201124,\quad \sum {hw}  = 34150\]

(i)     Calculate the correlation coefficient for this sample.

(ii)     Calculate the \(p\)-value of your result and interpret it at the \(1\% \) level of significance.

[8]
a.

(i)     Calculate the equation of the least squares regression line of \(w\) on \(h\) .

(ii)     The height of a randomly selected boy of this age is \(90\) cm. Estimate his weight.

[3]
b.

Markscheme

(i)     \(r = \frac{{34150 - 340 \times \frac{{2002}}{{20}}}}{{\sqrt {\left( {5830 - \frac{{{{340}^2}}}{{20}}} \right)\left( {201124 - \frac{{{{2002}^2}}}{{20}}} \right)} }}\)     (M1)(A1)

Note: Accept equivalent formula.

 

\( = 0.610\)     A1

 

(ii)     (\(T = R \times \sqrt {\frac{{n - 2}}{{1 - {R^2}}}} \) has the t-distribution with \(n - 2\) degrees of freedom)

\(t = 0.6097666 \ldots \sqrt {\frac{{18}}{{1 - {{(0.6097666 \ldots )}^2}}}} \)     M1

\( = 3.2640 \ldots \)     A1

\({\rm{DF}} = 18\)     A1

\(p{\rm{ - value}} = 0.00215 \ldots \)     A1

this is less than \(0.01\), so we conclude that there is a positive association between heights and weights of boys of this age     R1

 

[8 marks]

a.

(i)     the equation of the regression line of \(w\) on \(h\) is

\(w - \frac{{340}}{{20}} = \left( {\frac{{20 \times 34150 - 340 \times 2002}}{{20 \times 201124 - {{2002}^2}}}} \right)\left( {h - \frac{{2002}}{{20}}} \right)\)     M1

\(w = 0.160h + 0.957\)     A1

 

(ii) putting \(h = 90\) , \(w = 15.4\) (kg)     A1

Note: Award M0A0A0 for calculation of \(h\) on \(w\).

 

[3 marks]

b.
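
The arithmetic in both parts follows directly from the five summary statistics. The sketch below is an illustrative check, not the markscheme's method: it stops at the \(t\) statistic, because the \(p\)-value needs a \(t\)-distribution function that the Python standard library does not provide, and all variable names are ours.

```python
from math import sqrt

n = 20
sum_w, sum_h = 340, 2002
sum_w2, sum_h2, sum_hw = 5830, 201124, 34150

# Corrected sums of squares and products
s_hw = sum_hw - sum_h * sum_w / n
s_ww = sum_w2 - sum_w**2 / n
s_hh = sum_h2 - sum_h**2 / n

r = s_hw / sqrt(s_ww * s_hh)                  # sample correlation coefficient
t = r * sqrt((n - 2) / (1 - r**2))            # t statistic, n - 2 degrees of freedom

# Regression line of w on h: w = slope * h + intercept
slope = s_hw / s_hh
intercept = sum_w / n - slope * sum_h / n
w_at_90 = slope * 90 + intercept
```

Note that the slope divides by `s_hh`, the spread of the explanatory variable \(h\); dividing by `s_ww` instead would give the line of \(h\) on \(w\), which the markscheme does not credit.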

Examiners report

[N/A]
a.
[N/A]
b.



The weights of potatoes in a shop are normally distributed with mean \(98\) grams and standard deviation \(16\) grams.

The shopkeeper places \(100\) randomly chosen potatoes on a weighing machine. Find the probability that their total weight exceeds \(10\) kilograms.

[3]
a.

Find the minimum number of randomly selected potatoes which are needed to ensure that their total weight exceeds \(10\) kilograms with probability greater than \(0.95\).

[8]
b.

Markscheme

let \(T\) denote the total weight, then

\(T \sim N(9800,25600)\)     (M1)(A1)

\({\rm{P}}(T > 10000) = 0.106\)     A1

[3 marks]

a.

let there be \(n\) potatoes, in this case,

\(T \sim {\rm{N}}(98n,256n)\)     A1

we require

\({\rm{P}}(T > 10000) > 0.95\)     (M1)

or equivalently

\({\rm{P}}(T \le 10000) < 0.05\)     A1

standardizing,

\({\rm{P}}\left( {Z \le \frac{{10000 - 98n}}{{16\sqrt n }}} \right) < 0.05\)     A1

\(\frac{{10000 - 98n}}{{16\sqrt n }} < - 1.6449 \ldots \)     (A1)

\(98n - 26.32\sqrt n  - 10000 > 0\)     A1

solving the corresponding equation, \(n = 104.7 \ldots \)     (A1)

the required minimum value is \(105\)     A1

Note: Part (b) could also be solved using SOLVER and normalcdf, or by trial and improvement.

Note: Allow the use of \( = \) instead of \( < \) and \( > \) throughout.

[8 marks]

b.
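
The markscheme notes that part (b) can also be solved by trial and improvement. A minimal sketch of that approach, assuming Python's standard-library `statistics.NormalDist`; the function name `prob_total_exceeds` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def prob_total_exceeds(n, target=10_000, mean=98, sd=16):
    """P(T > target) where T is the total weight in grams of n independent potatoes."""
    total = NormalDist(n * mean, sd * sqrt(n))
    return 1 - total.cdf(target)

# (a) 100 potatoes
p_100 = prob_total_exceeds(100)

# (b) smallest n with P(T > 10 kg) > 0.95, found by direct search
n = 1
while prob_total_exceeds(n) <= 0.95:
    n += 1
min_potatoes = n
```

The search is valid because the standardized value \((98n - 10000)/(16\sqrt n )\) increases with \(n\), so the probability crosses 0.95 only once.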

Examiners report

[N/A]
a.
[N/A]
b.



The mean weight of a certain breed of bird is claimed to be 5.5 kg. In order to test this claim, a random sample of 10 birds of the breed was obtained and weighed, with the following results in kg.

\[5.41\quad \quad \quad 5.22\quad \quad \quad 5.54\quad \quad \quad 5.58\quad \quad \quad 5.20\quad \quad \quad 5.57\quad \quad \quad 5.23\quad \quad \quad 5.32\quad \quad \quad 5.46\quad \quad \quad 5.37\]

You may assume that the weights of this breed of bird are normally distributed.

State suitable hypotheses for testing the above claim using a two-tailed test.

[1]
a.

Calculate unbiased estimates of the mean and the variance of the weights of this breed of bird.

[4]
b.

Determine the \(p\)-value of the above data.

[4]
c.i.

State whether or not the claim is supported by the data, using a significance level of 5%.

[1]
c.ii.

Markscheme

\({H_0}:\mu  = 5.5;{\text{ }}{H_1}:\mu  \ne 5.5\)     A1

[1 mark]

a.

\(\sum x  = 53.9,{\text{ }}\hat \mu  = 5.39\)     (M1)A1

\(\sum {{x^2}}  = 290.7132,{\text{ }}{\hat \sigma ^2} = 0.0214\)     (M1)A1

 

Note:     Accept any answer that rounds correctly to 0.021.

 

[4 marks]

b.

attempt to use the \(t\)-test     (M1)

\(t =  - 2.38{\text{ }}({\text{Accept }} + 2.38)\)     (A1)

\({\text{DF}} = 9\)     (A1)

\(p{\text{ - value}} = 0.0412\)     A1

[4 marks]

c.i.

the claim is not supported (not accepted, rejected) at the 5% level of significance     A1

[1 mark]

c.ii.
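
The estimates, the \(t\) statistic and the \(p\)-value can be reproduced without a calculator's built-in test. The sketch below is one possible check under stated assumptions: the Python standard library has no \(t\)-distribution, so the two-tailed probability is approximated by integrating the density with Simpson's rule, and the helpers `t_pdf` and `two_tailed_p` are hypothetical names.

```python
from math import gamma, pi, sqrt
from statistics import mean, variance

weights = [5.41, 5.22, 5.54, 5.58, 5.20, 5.57, 5.23, 5.32, 5.46, 5.37]

mu_hat = mean(weights)                 # unbiased estimate of the mean
var_hat = variance(weights)            # divides by n - 1, so it is unbiased
n = len(weights)
df = n - 1
t = (mu_hat - 5.5) / sqrt(var_hat / n)

def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    c = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

def two_tailed_p(t_value, nu, steps=2000):
    """P(|T| > |t_value|) via Simpson's rule on [0, |t_value|]; steps must be even."""
    b = abs(t_value)
    h = b / steps
    total = t_pdf(0, nu) + t_pdf(b, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    central = 2 * total * h / 3        # P(-b < T < b) by symmetry
    return 1 - central

p_value = two_tailed_p(t, df)
```

Because the resulting `p_value` is below 0.05, the check leads to the same conclusion as part (c)(ii).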

Examiners report

[N/A]
a.
[N/A]
b.
[N/A]
c.i.
[N/A]
c.ii.



The weights, \(X\) kg , of male birds of a certain species are normally distributed with mean \(4.5\) kg and standard deviation \(0.2\) kg . The weights, \(Y\) kg , of female birds of this species are normally distributed with mean \(2.5\) kg and standard deviation \(0.15\) kg .

(i)     Find the mean and variance of \(2Y - X\) .

(ii)     Find the probability that the weight of a randomly chosen male bird is more than twice the weight of a randomly chosen female bird.

[6]
a.

Two randomly chosen male birds and three randomly chosen female birds are placed together on a weighing machine for which the recommended maximum weight is \(16\) kg . Find the probability that this maximum weight is exceeded.

[5]
b.

Markscheme

(i)     \({\rm{E}}(2Y - X) = 2 \times 2.5 - 4.5 = 0.5\)     A1

\({\rm{Var}}(2Y - X) = 4 \times {0.15^2} + {0.2^2} = 0.13\)     M1A1

 

(ii)     We require \({\rm{P}}(X > 2Y) = {\rm{P}}(2Y - X < 0)\)     M1

\(0.0828\)     A2

Note: Using tables, answer is \(0.0823\).

 

[6 marks]

a.

Let \(S\) denote the total weight of the \(5\) birds.

Then,

\({\rm{E}}(S) = 2 \times 4.5 + 3 \times 2.5 = 16.5\)     A1

\({\rm{Var}}(S) = 2 \times {0.2^2} + 3 \times {0.15^2} = 0.1475\)    M1A1

\({\rm{P}}(S > 16) = 0.904\)     A2

Note: Using tables, answer is \(0.903\).

[5 marks]

b.

Examiners report

[N/A]
a.
[N/A]
b.



Sarah is the quality control manager for the Stronger Steel Corporation which makes steel sheets. The steel sheets should have a mean tensile strength of 430 MegaPascals (MPa). If the mean tensile strength drops to 400 MPa, then Sarah must recommend a change in composition. The tensile strength of these steel sheets follows a normal distribution with a standard deviation of 35 MPa. Sarah defines the following hypotheses

\[{H_0}:\mu  = 430\]

\[{H_1}:\mu  = 400\]

where \(\mu \) denotes the mean tensile strength in MPa. She takes a random sample of \(n\) steel sheets and defines the critical region as \(\bar x \leqslant k\), where \(\bar x\) denotes the mean tensile strength of the sample in MPa and \(k\) is a constant.

Given that \(P{\text{(Type I Error)}} = 0.0851\) and \(P{\text{(Type II Error)}} = 0.115\), both correct to three significant figures, find the value of \(k\) and the value of \(n\).

Markscheme

\(\bar X \sim N\left( {430,{\text{ }}\frac{{{{35}^2}}}{n}} \right)\)     (M1)(A1)

Note: The M1 is for considering the distribution of \(\bar X\)

 

type I error gives \({\text{P}}(\bar X \leqslant k \mid \mu  = 430) = 0.0851\)

\(\frac{{k - 430}}{{\frac{{35}}{{\sqrt n }}}} =  - 1.37156 \ldots \)     M1A1

type II error gives \({\text{P}}(\bar X > k \mid \mu  = 400) = 0.115\)

\(\frac{{k - 400}}{{\frac{{35}}{{\sqrt n }}}} = 1.20035 \ldots \)     M1A1

Note: The two M1 marks above are for attempting to standardize \({\bar X}\) and obtain the corresponding equations with inverse normal values

 

solving simultaneously     (M1)

\(k = 414\)     A1

\(n = 9\)     A1
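
The simultaneous equations can be solved by subtracting one from the other to eliminate \(k\). The sketch below illustrates that step under the assumption that Python's standard-library `statistics.NormalDist` supplies the inverse normal values; the variable names are ours.

```python
from statistics import NormalDist

sigma = 35
mu0, mu1 = 430, 400
alpha, beta = 0.0851, 0.115        # P(Type I error), P(Type II error)

std_normal = NormalDist()
z_alpha = std_normal.inv_cdf(alpha)        # (k - mu0) / (sigma / sqrt(n)), negative
z_beta = std_normal.inv_cdf(1 - beta)      # (k - mu1) / (sigma / sqrt(n)), positive

# Subtracting the two equations eliminates k:
#   mu0 - mu1 = (z_beta - z_alpha) * sigma / sqrt(n)
root_n = (z_beta - z_alpha) * sigma / (mu0 - mu1)
n = round(root_n ** 2)                     # sample size must be a whole number
k = mu1 + z_beta * sigma / root_n
```

Rounding is needed only because the two error probabilities are quoted to three significant figures; `root_n` comes out very close to 3.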

Examiners report

This proved to be a difficult question for most candidates with only a minority giving a correct solution. Most candidates either made no attempt at the question or just wrote several lines of irrelevant mathematics.




Bottles of iced tea are supposed to contain 500 ml. A random sample of 8 bottles was selected and the volumes measured (in ml) were as follows:

497.2, 502.0, 501.0, 498.6, 496.3, 499.1, 500.1, 497.7 .

  (i)     Calculate unbiased estimates of the mean and variance.

  (ii)     Test at the \(5\%\) significance level the null hypothesis \({{\rm{H}}_0}:\mu  = 500\) against the alternative hypothesis \({{\rm{H}}_1}:\mu  < 500\) .

[5]
a.

A random sample of size four is taken from the distribution N(60, 36) .

Calculate the probability that the sum of the sample values is less than 250.

[6]
b.

Markscheme

(i)     497.2, 502.0, 501.0, 498.6, 496.3, 499.1, 500.1, 497.7

using the GDC

\(\overline x  = 499.0\) , \({\sigma ^2} = 3.8(0)\)     A1A1

Note: Accept \(499\).

 

(ii)     EITHER

\(p\)-value = 0.0950     A1

since \(0.0950 > 0.05\) accept \({H_0}\)     R1A1

OR

\({t_{calc}} = - 1.45\) , \({t_{critical}} = - 1.895\) for \(v = 7\) at 5 % level     A1

since \({t_{calc}} > {t_{critical}}\) accept \({H_0}\)     R1A1

 

[5 marks]

a.

each \(X \sim {\rm{N}}(60,36)\) so \(\sum\limits_{n = 1}^4 {{X_n} \sim {\rm{N}}(4(60),4(36)) = {\rm{N}}(240,144)} \)     M1A1A1

\({\rm{Pr}}({\rm{Sum}} < 250) = {\rm{Pr}}\left( {z < \frac{{250 - 240}}{{12}} = \frac{5}{6}} \right)\)     (M1)(A1)

\( = 0.798\) (by GDC)     A1

Notes: Accept \(0.797\) (tables).

    Answer only is awarded M0A0A0(M1)(A1)A1.

[6 marks]

b.

Examiners report

(a)(i) Very few mistakes were made in this question, although sometimes variance and standard deviation were confused. Why both variance and standard deviation are needed might be something that teachers could explore.

(ii) Again there were no serious problems although some candidates fail to show all the important parameters such as degrees of freedom.

a.

This was found to be relatively straightforward except for using the correct variance of \(144\). It would be useful here to make clear the distinction between the sum of random variables and a multiple of a random variable.

b.



The discrete random variables \({X_n},{\text{ }}n \in {\mathbb{Z}^ + }\) have probability generating functions given by \({G_n}(t) = \frac{t}{n}\left( {\frac{{{t^n} - 1}}{{t - 1}}} \right)\).

Let \({X_{n - 1}}\) and \({X_{n + 1}}\) be independent.

Use the formula for the sum of a finite geometric series to show that

\[{\text{P}}({X_n} = k) = \left\{ {\begin{array}{*{20}{l}} {\frac{1}{n}}&{{\text{for }}1 \leqslant k \leqslant n} \\ 0&{{\text{otherwise}}} \end{array}.} \right.\]

[4]
a.

Find \({\text{E}}({X_n})\).

[3]
b.

Find the set of values of \(n\) for which \({\text{E}}({X_{n - 1}} \times {X_{n + 1}}) < 2n\).

[4]
c.

Markscheme

using \(\left( {\frac{{{t^n} - 1}}{{t - 1}}} \right) = 1 + t + {t^2} +  \ldots  + {t^{n - 1}}\)     M1

\({G_n}(t) = 0 + \frac{t}{n} + \frac{{{t^2}}}{n} + \frac{{{t^3}}}{n} +  \ldots  + \frac{{{t^n}}}{n} + 0 \times {t^{n + 1}} + 0 \times  \ldots \)    A1A1

 

Note:     A1 for the non-zero terms, A1 for the observation that all other terms are zero.

 

the statement that the coefficient of \({t^k}\) gives \({\text{P}}({X_n} = k)\)     R1

hence \({\text{P}}({X_n} = k) = \left\{ {\begin{array}{*{20}{l}} {\frac{1}{n}}&{{\text{for }}1 \leqslant k \leqslant n} \\ 0&{{\text{otherwise}}} \end{array}} \right.\)       AG

[4 marks]

a.

\({\text{E}}({X_n}) = 0 \times 0 + 1 \times \frac{1}{n} + 2 \times \frac{1}{n} + 3 \times \frac{1}{n} +  \ldots n \times \frac{1}{n} + (n + 1) \times 0 +  \ldots  \times 0\)     (M1)(A1)

\( = \frac{1}{n} \times \sum\limits_{k = 1}^{k = n} k \)

\( = \frac{1}{n} \times \frac{1}{2}n(n + 1) = \frac{{n + 1}}{2}\)    A1

 

Note:     Accept use of \(G'(1)\).

 

[3 marks]

b.

\({X_{n - 1}}\) and \({X_{n + 1}}\) are independent \( \Rightarrow {\text{E}}({X_{n - 1}} \times {X_{n + 1}}) = {\text{E}}({X_{n - 1}}) \times {\text{E}}({X_{n + 1}})\)     M1

\( = \frac{n}{2} \times \frac{{n + 2}}{2}\)  A1

required to solve \({n^2} < 6n{\text{ }}({\text{or }}n + 2 < 8)\)     M1

solution: \((2 \leqslant ){\text{ }}n < 6\)     A1

[4 marks]

c.
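
Parts (b) and (c) can be verified with exact rational arithmetic. This is a small illustrative check, not a proof: it assumes the uniform distribution established in part (a), and the search bound of 50 is an arbitrary cutoff chosen for the demonstration.

```python
from fractions import Fraction

def pmf(n):
    """P(X_n = k) = 1/n for k = 1..n, read off as the coefficients of G_n(t)."""
    return {k: Fraction(1, n) for k in range(1, n + 1)}

def expectation(n):
    """E(X_n) computed directly from the probability mass function."""
    return sum(k * p for k, p in pmf(n).items())

# (b) E(X_n) agrees with (n + 1)/2 for the first few n
checks = all(expectation(n) == Fraction(n + 1, 2) for n in range(1, 50))

# (c) independence gives E(X_{n-1} X_{n+1}) = E(X_{n-1}) E(X_{n+1});
# X_{n-1} needs n >= 2, and the upper bound 50 is an arbitrary cutoff
solutions = [n for n in range(2, 50)
             if expectation(n - 1) * expectation(n + 1) < 2 * n]
```

The boundary case \(n = 6\) gives a product of exactly 12, which equals \(2n\) and so fails the strict inequality.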

Examiners report

Again this was a question that tested candidates and although many started only a very limited number were able to make significant progress. Part (a) was rarely done well with most candidates not understanding what was required. There was a little more success with part (b) but a number of candidates attempted methods that were not going to lead to anything meaningful. Most candidates did not understand what was required from part (c) and few correct answers were seen, even taking into account the fact that follow through marks could be awarded from (b).

a.

b.

c.



Bill buys two biased coins from a toy shop.

The shopkeeper claims that when one of the coins is tossed, the probability of obtaining a head is \(0.6\). Bill wishes to test this claim by tossing the coin \(250\) times and counting the number of heads obtained.

  (i)     State suitable hypotheses for this test.

  (ii)     He obtains \(140\) heads. Find the \(p\)-value of this result and determine whether or not it supports the shopkeeper’s claim at the \(5\%\) level of significance.

[6]
a.

Bill tosses the other coin a large number of times and counts the number of heads obtained. He correctly calculates a \(95\%\) confidence interval for the probability that when this coin is tossed, a head is obtained. This is calculated as [\(0.35199\), \(0.44801\)] where the end-points are correct to five significant figures.

Determine

  (i)     the number of times the coin was tossed;

  (ii)     the number of heads obtained.

[7]
b.

Markscheme

(i)     \({{\rm{H}}_0}:p = 0.6\) ; \({{\rm{H}}_1}:p \ne 0.6\)     A1A1

 

(ii)     EITHER

using a normal approximation, \(p\)-value \( = 0.197\)     A2

Note: Award A1 for \(0.0984\).

 

the shopkeeper’s claim is supported     A1

because \(0.197 > 0.05\)     R1

OR

using binomial distribution, \(p\)-value \( = 0.221\)     A2

Note: Award A1 for \(0.110\).

 

the shopkeeper’s claim is supported     A1

because \(0.221 > 0.05\)     R1

Note: Follow through the candidate’s \(p\)-value for A1R1.

Note: Accept \(p\)-values correct to two significant figures.

 

[6 marks]

a.

(i)     \(\hat p = \frac{{0.35199 + 0.44801}}{2} = 0.4\)     A1

width of CI \( = 3.92\sqrt {\frac{{0.4 \times 0.6}}{n}} \)    M1

\(3.92\sqrt {\frac{{0.4 \times 0.6}}{n}} = 0.44801 - 0.35199 = 0.096(02)\)     A1

solving,

\(n = {\left( {\frac{{3.92}}{{0.096(02)}}} \right)^2} \times 0.24\)     (M1)

\( = 400\)     A1

 

(ii)     \(x = n\widehat p = 400 \times 0.4 = 160\)     M1A1

 

[7 marks]

b.
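
Part (b) inverts the confidence interval formula for a proportion. The sketch below shows the same steps numerically, assuming Python's standard-library `statistics.NormalDist` for the 95% critical value; the variable names are ours.

```python
from statistics import NormalDist

lower, upper = 0.35199, 0.44801

p_hat = (lower + upper) / 2                     # centre of the interval
z = NormalDist().inv_cdf(0.975)                 # about 1.96 for a 95% interval

# width = 2 z sqrt(p_hat (1 - p_hat) / n), solved for n
width = upper - lower
n = round((2 * z / width) ** 2 * p_hat * (1 - p_hat))
heads = round(n * p_hat)
```

Rounding to the nearest integer is appropriate here because the end-points are given only to five significant figures.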

Examiners report

Part (a) was well answered in general, using the calculator either to carry out a significance test on proportions or to find the \(p\)-value directly using the binomial distribution. Some candidates gave their conclusion in the form "Accept H0"; this was not accepted since the question asked whether or not the shopkeeper’s claim was supported and a direct answer to this question was required.

a.

Part (b) caused problems for some candidates who were unsure how to proceed. Some candidates used a trial and error method which involved showing that \(\hat p = 0.4\) and then using their calculator to find the confidence interval for appropriate pairs of values for \(n\) and \(p\) until reaching \(400,160\). This was accepted as a valid method although it is not recommended as a general method since its success was based upon the value of \(n\) being one that would probably be tested.

b.



All members of a large athletics club take part in an annual shotput competition.

The following data give the distances achieved, in metres, by a random selection of 10 members of the club in the 2016 competition

11.8, 14.3, 13.8, 10.3, 14.9, 14.7, 12.4, 13.9, 14.0, 11.7

The president of the club wishes to test whether these data provide evidence that distances achieved have increased since the 2015 competition, when the mean result for the club was 12.4 m. You may assume that the distances achieved follow a normal distribution with mean \(\mu \), variance \({\sigma ^2}\), and that the membership of the club has not changed from 2015 to 2016.

State suitable hypotheses.

[1]
a.

(i)     Give a reason why a \(t\) test is appropriate and write down its degrees of freedom.

(ii)     Find the critical region for testing at each of the 5% and 10% significance levels.

[4]
b.

(i)     Find unbiased estimates of \(\mu \) and \({\sigma ^2}\).

(ii)     Find the value of the test statistic.

[3]
c.

State the conclusions that the president of the club should reach from this test, giving reasons for your answer.

[2]
d.

Markscheme

\({H_0}:{\text{ }}\mu  = 12.4;{\text{ }}{H_1}:{\text{ }}\mu  > 12.4\)    A1

[1 mark]

a.

(i)     \(t\) test is appropriate because the variance (standard deviation) is unknown     R1

\(v = 9\)    A1

(ii)     \(t \geqslant 1.83{\text{ }}(5\% );{\text{ }}t \geqslant 1.38{\text{ }}(10\% )\)     A1A1

 

Note:     Accept strict inequalities.

 

[4 marks]

b.

(i)     unbiased estimate of \(\mu \) is 13.18     A1

 

Note:     Accept 13.2.

 

unbiased estimate of \({\sigma ^2}\) is 2.34 \(({1.531^2})\)     A1

(ii)     \({t_{{\text{calc}}}} = \left( {\frac{{13.18 - 12.4}}{{\frac{{1.531}}{{\sqrt {10} }}}}} \right) = 1.61{\text{ or }}1.65\)     A1

[3 marks]

c.

as \(1.38 < 1.61 < 1.83\)     R1

evidence to accept \({H_0}\) at the 5% level, but not at the 10% level     A1

 

Note:     Accept the use of the \(p\)-value \( = 0.0708\).

 

[2 marks]

d.

Examiners report

Most candidates had an understanding of how to start the question, but only a small number were able to gain full marks. It appeared that many candidates were used to finding \(p\)-values, but showed a lack of understanding when asked to find the critical regions and test a \(t\)-value. The conclusions required in part (d) were often too brief and/or poorly expressed.

a.

b.

c.

d.



Sami is undertaking market research on packets of soap powder. He considers the brand “Gleam”. The weight of the contents of a randomly chosen packet of “Gleam” follows a normal distribution with mean 750 grams and standard deviation 20 grams.

The weight of the packaging follows a different normal distribution with mean 40 grams and standard deviation 5 grams.

Find:

(i)     the probability that a randomly chosen packet of “Gleam” has a total weight exceeding 780 grams.

(ii)     the probability that the total weight of the contents of five randomly chosen packets of “Gleam” exceeds 3800 grams.

[8]
a.

Sami now considers the brand “Bright”. The weight of the contents of a randomly chosen packet of “Bright” follows a normal distribution with mean 650 grams and standard deviation 16 grams. Find the probability that the contents of six randomly chosen packets of “Bright” weigh more than the contents of five randomly chosen packets of “Gleam”.

[4]
b.

Markscheme

Note: In all parts accept answers which round to the correct 2sf answer.

 

(i)     contents: \(X \sim N(750,{\text{ }}400)\)

packaging: \(Y \sim N(40,{\text{ }}25)\)

consider \(X + Y\)     (M1)

\({\text{E}}(X + Y) = 790\)     A1

\({\text{Var}}(X + Y) = 425\)     A1

\({\text{P}}(X + Y > 780) = 0.686\)     A1

 

(ii)     Let \({X_1} + {X_2} + {X_3} + {X_4} + {X_5} = A\)     M1

\({\text{E}}(A) = 5{\text{E}}(X) = 3750\)     A1

\({\text{Var}}(A) = 5{\text{Var}}(X) = 2000\)     A1

\({\text{P}}(A > 3800) = 0.132\)     A1

Note: Condone the notation \(A = 5X\) if the variance is correct, M0 if not

a.

contents of Bright: \(B \sim N(650,{\text{ }}256)\)

let \(G = {B_1} + {B_2} + {B_3} + {B_4} + {B_5} + {B_6} - ({X_1} + {X_2} + {X_3} + {X_4} + {X_5})\)     M1

\({\text{E}}(G) = 6 \times 650 - 5 \times 750 = 150\)     A1

\({\text{Var}}(G) = 6 \times 256 + 5 \times 400 = 3536\)     A1

\({\text{P}}(G > 0) = 0.994\)     A1

Note: Condone the notation \(G = 6B - 5X\) if the variance is correct, M0 if not

b.
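
The calculations in parts (a) and (b) can be checked with a short Python sketch. It assumes, as the markscheme does implicitly, that contents and packaging weights are independent and that packets are independent of each other; the helper name `upper_tail` is illustrative only. The last probability shows how the answer changes under the \(5X\) error, where the variance is wrongly taken as \(25{\text{Var}}(X)\).

```python
from math import sqrt
from statistics import NormalDist

# Parameters from the question (weights in grams).
MU_X, VAR_X = 750, 20**2   # contents of one Gleam packet
MU_Y, VAR_Y = 40, 5**2     # packaging of one Gleam packet
MU_B, VAR_B = 650, 16**2   # contents of one Bright packet


def upper_tail(mean, variance, threshold):
    """Return P(W > threshold) for W ~ N(mean, variance)."""
    return 1 - NormalDist(mean, sqrt(variance)).cdf(threshold)


# (a)(i) total weight X + Y of one packet: means and variances add.
p_total = upper_tail(MU_X + MU_Y, VAR_X + VAR_Y, 780)

# (a)(ii) sum of five independent contents: variance is 5 * Var(X).
p_five = upper_tail(5 * MU_X, 5 * VAR_X, 3800)

# The common error: treating the sum as 5X gives variance 25 * Var(X).
p_five_wrong = upper_tail(5 * MU_X, 25 * VAR_X, 3800)

# (b) G = (B1 + ... + B6) - (X1 + ... + X5): variances still add.
p_bright = upper_tail(6 * MU_B - 5 * MU_X, 6 * VAR_B + 5 * VAR_X, 0)

print(round(p_total, 3), round(p_five, 3), round(p_bright, 3))
print(round(p_five_wrong, 3))
```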

Examiners report

Part (a)(i) was well answered in general. In (a)(ii) and (b), however, many candidates made the fairly common error of confusing \(\sum\limits_{i = 1}^n {{X_i}} \) with \(nX\) which gives an incorrect variance. This is an important distinction which needs to be emphasized.

a.

Part (a)(i) was well answered in general. In (a)(ii) and (b), however, many candidates made the fairly common error of confusing \(\sum\limits_{i = 1}^n {{X_i}} \) with \(nX\) which gives an incorrect variance. This is an important distinction which needs to be emphasized.

b.



The random variables \(X\), \(Y\) follow a bivariate normal distribution with product moment correlation coefficient \(\rho \). The following table gives a random sample from this distribution.


 

(a)     Determine the value of \(r\), the product moment correlation coefficient of this sample.

(b)     (i)     Write down hypotheses in terms of \(\rho \) which would enable you to test whether or not \(X\) and \(Y\) are independent.

(ii)     Determine the p-value of the above sample and state your conclusion at the 5% significance level. Justify your answer.

(c)     (i)     Determine the equation of the regression line of \(y\) on \(x\).

(ii)     State whether or not this equation can be used to obtain an accurate prediction of the value of \(y\) for a given value of \(x\). Give a reason for your answer.

Markscheme

(a)     \(r =  - 0.163\)     A2

[2 marks]

 

(b)     (i)     \({{\text{H}}_0}:\rho  = 0;{\text{ }}{{\text{H}}_1}:\rho  \ne 0\)     A1

(ii)     \(t = r\sqrt {\frac{{n - 2}}{{1 - {r^2}}}}  =  - 0.468 \ldots \)     (A1)

\({\text{DF}} = 8\)     (A1)

\(p{\text{-value}} = 2 \times 0.326 \ldots  = 0.652\)   A1

since \(0.652 > 0.05\), we accept \({{\text{H}}_0}\)     R1

 

Note: Award (A1)(A1)A0 if the p-value is given as \(0.326\) without prior working.

 

Note: Follow through their p-value for the R1.

 

[5 marks]

 

(c)     (i)     \(y =  - 0.257x + 5.22\)     A1

 

Note: Accept answers which round to \(-0.26\) and \(5.2\).

 

(ii)     no, because \(X\) and \(Y\) have been shown to be independent (or equivalent)     A1

[2 marks]
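
As a check on the test statistic in (b)(ii), the rounded value \(r =  - 0.163\) from part (a) can be substituted directly, taking \(n = 10\) (implied by \({\text{DF}} = n - 2 = 8\)):

\[t =  - 0.163\sqrt {\frac{{10 - 2}}{{1 - {{( - 0.163)}^2}}}}  =  - 0.163\sqrt {\frac{8}{{0.973431}}}  \approx  - 0.467.\]

The small difference from the markscheme value \( - 0.468 \ldots \) is presumably due to the markscheme using the unrounded value of \(r\).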

Examiners report

[N/A]



The following table shows the probability distribution of the discrete random variable \(X\).

\[\begin{array}{c|ccc} x & 1 & 2 & 3 \\ \hline {\text{P}}(X = x) & \frac{1}{4} & \frac{1}{2} & \frac{1}{4} \end{array}\]


 

(a)     Show that the probability generating function of \(X\) is given by

\[G(t) = \frac{{t{{(1 + t)}^2}}}{4}.\]

(b)     Given that \(Y = {X_1} + {X_2} + {X_3} + {X_4}\), where \({X_1},{\text{ }}{X_2},{\text{ }}{X_3},{\text{ }}{X_4}\) is a random sample from the distribution of \(X\),

(i)     state the probability generating function of \(Y\);

(ii)     hence find the value of \({\text{P}}(Y = 8)\).

Markscheme

(a)     \(G(t) = \frac{1}{4}t + \frac{1}{2}{t^2} + \frac{1}{4}{t^3}\)     M1A1

\( = \frac{{t{{(1 + t)}^2}}}{4}\)     AG

[2 marks]

 

(b)     (i)     \({\text{PGF of }} Y = {\left( {G(t)} \right)^4}\left( { = {{\left( {\frac{{t{{(1 + t)}^2}}}{4}} \right)}^4}} \right)\)     A1

(ii)     \({\text{P}}(Y = 8) = {\text{coefficient of }}{t^8}\)     (M1)

\( = \frac{{^8{{\text{C}}_4}}}{{256}}\)     (A1)

\( = \frac{{35}}{{128}}   (0.273)\)     A1

 

Note: Accept \(0.27\) or answers that round to \(0.273\).

 

[4 marks]
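
The coefficient extraction in (b)(ii) can be verified by multiplying the polynomials exactly. The following is a minimal Python sketch; `poly_mul` is an illustrative helper, and the coefficients of \(G(t)\) are read off the expansion in part (a).

```python
from fractions import Fraction


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (index = power of t)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


# G(t) = t/4 + t^2/2 + t^3/4, so the list holds the coefficients of t^0..t^3.
g = [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]

# The PGF of Y = X1 + X2 + X3 + X4 is G(t)^4 for independent X_i.
pgf_y = [Fraction(1)]
for _ in range(4):
    pgf_y = poly_mul(pgf_y, g)

p_y_8 = pgf_y[8]  # coefficient of t^8 is P(Y = 8)
print(p_y_8, float(p_y_8))
```

Using `Fraction` keeps the arithmetic exact, so the result can be compared directly with \(\frac{{35}}{{128}}\).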

Examiners report

[N/A]



Let \({X_k}\) be independent normal random variables, where \({\rm{E}}({X_k}) = \mu \) and \({\rm{Var}}({X_k}) = \sqrt k \) , for \(k = 1,2, \ldots \) .

The random variable \(Y\) is defined by \(Y = \sum\limits_{k = 1}^6 {\frac{{{{( - 1)}^{k + 1}}}}{{\sqrt k }}} {X_k}\) .

(i)     Find \({\rm{E}}(Y)\) in the form \(p\mu \) , where \(p \in \mathbb{R}\) .

(ii)     Find \(k\) if \({\rm{Var}}({X_k}) < {\rm{Var}}(Y) < {\rm{Var}}({X_{k + 1}})\) .

[5]
a.

A random sample of \(n\) values of \(Y\) was found to have a mean of \(8.76\).

  (i)     Given that \(n = 10\) , determine a \(95\%\) confidence interval for \(\mu \) .

  (ii)     The width of the confidence interval needs to be halved. Find the appropriate value of \(n\) .

[6]
b.

Markscheme

(i)     \({\rm{E}}(Y) = \frac{1}{{\sqrt 1 }}\mu  - \frac{1}{{\sqrt 2 }}\mu  + \frac{1}{{\sqrt 3 }}\mu  - \frac{1}{{\sqrt 4 }}\mu  + \frac{1}{{\sqrt 5 }}\mu  - \frac{1}{{\sqrt 6 }}\mu \)     (M1)

\( = 0.409\mu \)   \((0.409209 \ldots \mu )\)     A1

Note: Accept answers which round to \(0.41\).

 

(ii)     \({\rm{Var}}(Y) = \frac{1}{{\sqrt 1 }} + \frac{1}{{\sqrt 2 }} + \frac{1}{{\sqrt 3 }} + \frac{1}{{\sqrt 4 }} + \frac{1}{{\sqrt 5 }} + \frac{1}{{\sqrt 6 }}\)     (M1)

\( = 3.64\)   \((3.6399 \ldots )\)     A1

\(({\rm{Var}}({X_{13}}) = 3.61;{\rm{Var}}({X_{14}}) = 3.74) \Rightarrow k = 13\)     A1

 

[5 marks]

a.
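
The two sums in part (a) are easy to check numerically. The sketch below is illustrative (the variable names are not from the markscheme); it uses \({\rm{Var}}(aX) = {a^2}{\rm{Var}}(X)\), so each term of \({\rm{Var}}(Y)\) is \(\frac{1}{k} \times \sqrt k  = \frac{1}{{\sqrt k }}\).

```python
from math import sqrt

# Coefficients a_k = (-1)^(k+1) / sqrt(k) for k = 1..6.
coeffs = [(-1) ** (k + 1) / sqrt(k) for k in range(1, 7)]

# E(Y) = p * mu, since every X_k has mean mu.
p = sum(coeffs)

# Var(Y) = sum of a_k^2 * Var(X_k), with Var(X_k) = sqrt(k) and independence.
var_y = sum(a * a * sqrt(k) for k, a in enumerate(coeffs, start=1))

# Find k with sqrt(k) < Var(Y) < sqrt(k + 1).
k = 1
while sqrt(k + 1) <= var_y:
    k += 1

print(round(p, 6), round(var_y, 4), k)
```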

(i)     \(95\%\) CI for \({\rm{E}}(Y)\) is

\(8.76 \pm 1.96\sqrt {\frac{{3.6399 \ldots }}{{10}}} \)     (M1)

\( = \left[ {7.58,9.94} \right]\)     A1A1

Note: Accept \(\left[ {7.6,9.9} \right]\) . Do not penalize answers given to more than 3sf.

 

Since \(\mu  = \frac{{{\rm{E}}(Y)}}{{0.409 \ldots }}\) , CI for \(\mu \) is \(\left[ {18.5,24.3} \right]\)     A1

Note: Do not penalize answers given to more than 3sf.

 

(ii)     width of a CI is inversely proportional to the square root of \(n\)     (M1)

so \(n = 40\)     A1

 

[6 marks]

b.
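
The conversion from an interval for \({\rm{E}}(Y)\) to an interval for \(\mu \) can be sketched in Python as follows. This is a check under the markscheme's assumptions (known variance, so a normal interval is used); the exact quantile from `inv_cdf` replaces the tabulated 1.96.

```python
from math import sqrt
from statistics import NormalDist

p = sum((-1) ** (k + 1) / sqrt(k) for k in range(1, 7))  # E(Y) = p * mu
var_y = sum(1 / sqrt(k) for k in range(1, 7))            # Var(Y), known

y_bar, n = 8.76, 10
z = NormalDist().inv_cdf(0.975)       # approximately 1.96
half_width = z * sqrt(var_y / n)

# 95% confidence interval for E(Y).
ci_ey = (y_bar - half_width, y_bar + half_width)

# Divide by p to get the interval for mu; p > 0, so the order is preserved.
ci_mu = (ci_ey[0] / p, ci_ey[1] / p)

# The width is proportional to 1 / sqrt(n), so halving it needs 4n observations.
n_half = 4 * n
half_width_new = z * sqrt(var_y / n_half)

print([round(v, 2) for v in ci_ey], [round(v, 1) for v in ci_mu], n_half)
```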

Examiners report

This question proved to be the most difficult on the paper and few fully correct answers were seen. In part (a) (i) many candidates did know how to find the answers in terms of \(\mu\). Very few candidates successfully completed part (a) (ii).

a.

This question proved to be the most difficult on the paper and few fully correct answers were seen. In part (b) (i) a number of candidates made some progress, but few realised or knew how to convert the confidence interval for \({\rm{E}}(Y)\) into a confidence interval for \(\mu\). For those who persevered to the end of the question, there was a reasonable degree of success in part (b) (ii).

b.



The weights, in grams, of 10 apples were measured with the following results:

     \(212.2\)     \(216.9\)     \(209.0\)     \(215.5\)     \(215.9\)     \(213.5\)     \(208.9\)     \(213.8\)     \(216.4\)     \(209.9\)

You may assume that this is a random sample from a normal distribution with mean \(\mu \) and variance \({\sigma ^2}\).

(a)     Giving all your answers correct to four significant figures,

(i)     determine unbiased estimates for \(\mu \) and \({\sigma ^2}\);

(ii)     find a \(95\%\) confidence interval for \(\mu \).

Another confidence interval for \(\mu \), \([211.5, 214.9]\), was calculated using the above data.

(b)     Find the confidence level of this interval.

Markscheme

(a)     (i)     \(\bar x = {\text{213.2}}\)     A1

\(s = 3.0728 \ldots \)     (A1)

\({s^2} = 9.442\)     A1

 

(ii)     \([211.0, 215.4]\)     A1A1

 

Note: Accept \(211\) in place of \(211.0\).

 

Note: Apart from the above note, accept any answers which round to the correct 4 significant figure answers.

 

[5 marks]

 

(b)     use of the fact that the width of the interval is \(2t \times \frac{s}{{\sqrt n }}\)     (A1)

so that \(3.4 = 2t \times \frac{{3.0728 \ldots }}{{\sqrt {10} }}\)     M1

\(t = 1.749\)     A1

degrees of freedom \( = 9\)     (A1)

\({\text{P}}(T > 1.749) = 0.0571\)     (M1)

confidence level \( = 1 - 2 \times 0.0571 = 0.886{\text{ }}(88.6\% )\)     A1

 

Note: Award the \({\text{DF}} = 9\) (A1) mark if the following line has \(0.00337\) on the RHS.

 

Note: Accept any answer which rounds to \(88.6\%\).

 

[6 marks]
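
The estimates in (a)(i) and the confidence level in (b) can be reproduced with the Python standard library. The standard library has no \(t\)-distribution, so this sketch integrates the density numerically with Simpson's rule; `t_pdf` and `t_upper_tail` are illustrative helpers, not library functions.

```python
from math import gamma, pi, sqrt
from statistics import mean, variance

weights = [212.2, 216.9, 209.0, 215.5, 215.9, 213.5, 208.9, 213.8, 216.4, 209.9]
n = len(weights)

x_bar = mean(weights)    # unbiased estimate of mu
s2 = variance(weights)   # unbiased estimate of sigma^2 (divisor n - 1)


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# (b) The interval [211.5, 214.9] has width 2 * t * s / sqrt(n); solve for t.
lower, upper = 211.5, 214.9
t_val = (upper - lower) / 2 / sqrt(s2 / n)
level = 1 - 2 * t_upper_tail(t_val, n - 1)

print(round(x_bar, 1), round(s2, 3), round(t_val, 3), round(level, 3))
```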

Examiners report

[N/A]



A sample of size 100 is taken from a normal population with unknown mean μ and known variance 36.

An investigator wishes to test the hypotheses H0 : μ = 65, H1 : μ > 65.

He decides on the following acceptance criteria:

Accept H0 if the sample mean \(\bar x\) ≤ 66.5

Accept H1 if \(\bar x\) > 66.5

Another investigator decides to use the same data to test the hypotheses H0 : μ = 65, H1 : μ = 67.9.

Find the probability of a Type I error.

[3]
a.

She decides to use the same acceptance criteria as the previous investigator. Find the probability of a Type II error.

[3]
b.i.

Find the critical value for \({\bar x}\) if she wants the probabilities of a Type I error and a Type II error to be equal.

[3]
b.ii.

Markscheme

\(\bar X \sim {\text{N}}\left( {\mu ,\,\frac{{{\sigma ^2}}}{n}} \right)\)

\(\bar X \sim {\text{N}}\left( {65,\,\frac{{36}}{{100}}} \right)\)     (A1)

P(Type I Error) \( = {\text{P}}\left( {\bar X > 66.5} \right)\)      (M1)

= 0.00621       A1

[3 marks]

a.

P(Type II Error) = P(accept H0 | H1 is true)

\( = {\text{P}}\left( {\bar X \leqslant 66.5\left| {\mu  = 67.9} \right.} \right)\)        (M1)

\( = {\text{P}}\left( {\bar X \leqslant 66.5} \right)\) when \(\bar X \sim {\text{N}}\left( {67.9,\,\frac{{36}}{{100}}} \right)\)        (M1)

= 0.00982      A1

[3 marks]

b.i.

the variances of the distributions given by H0 and H1 are equal,       (R1)

by symmetry the value of \({\bar x}\) lies midway between 65 and 67.9      (M1)

\( \Rightarrow \bar x = \frac{1}{2}\left( {65 + 67.9} \right) = 66.45\)       A1

[3 marks]

b.ii.
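
The three answers can be checked with a short Python sketch using the sampling distribution \(\bar X \sim {\text{N}}\left( {\mu ,\,\frac{{36}}{{100}}} \right)\). The variable names are illustrative; the last two lines confirm that the midpoint critical value makes the two error probabilities equal.

```python
from math import sqrt
from statistics import NormalDist

n, sigma2 = 100, 36
se = sqrt(sigma2 / n)      # standard deviation of the sample mean, 0.6
mu0, mu1 = 65, 67.9        # means under H0 and H1
crit = 66.5                # accept H1 if the sample mean exceeds this value

# (a) Type I error: reject H0 although mu = 65.
type_1 = 1 - NormalDist(mu0, se).cdf(crit)

# (b)(i) Type II error: accept H0 although mu = 67.9.
type_2 = NormalDist(mu1, se).cdf(crit)

# (b)(ii) Equal variances, so by symmetry the critical value is the midpoint.
equal_crit = (mu0 + mu1) / 2
alpha_eq = 1 - NormalDist(mu0, se).cdf(equal_crit)
beta_eq = NormalDist(mu1, se).cdf(equal_crit)

print(round(type_1, 5), round(type_2, 5), equal_crit)
```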

Examiners report

[N/A]
a.
[N/A]
b.i.
[N/A]
b.ii.